AITopics | theorem 7

Collaborating Authors

theorem 7

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Convergence Analysis of Newton's Method for Neural Networks in the Overparameterized Limit

Riedl, Konstantin, Spiliopoulos, Konstantinos, Sirignano, Justin

arXiv.org Machine LearningMay-21-2026

A convergence analysis is developed for the regularized Newton method for training neural networks (NNs) in the overparameterized limit. As the number of hidden units tends to infinity, the NN training dynamics converge in probability to the solution of a deterministic limit equation involving a ``Newton neural tangent kernel'' (NNTK). Explicit rates characterizing this convergence are provided and, in the infinite-width limit, we prove that the NN converges exponentially fast to the target data (i.e., a global minimizer with zero loss). We show that this convergence is uniform across the frequency spectrum, addressing the spectral bias inherent in gradient descent. The eigenvalues of the NTK for gradient descent accumulate at zero, leading to slow convergence for target data with high-frequency components. In contrast, the NNTK has uniformly lower bounded eigenvalues if the regularization parameter is selected appropriately, allowing Newton's method to converge more quickly for data with high-frequency components. Mathematical challenges that need to be addressed in our analysis include the implicit parameter update of the Newton method with a potentially indefinite Hessian matrix and the fact that the dimension of this linear system of equations tends to infinity as the NN width grows. This complicates deriving the training dynamics in the overparameterized limit as well as proving the convergence of the finite-width dynamics thereto. The analysis identifies a scaling formula for selecting the regularization parameter, which we show can vanish at a suitable rate as the number of hidden units becomes larger. We prove that, for sufficiently large numbers of hidden units, the regularized Hessian remains positive definite during training and the Newton updates for individual NN parameters converge to zero, showing that the model behaves as a linearization around the initialization.

artificial intelligence, convergence, machine learning, (18 more...)

arXiv.org Machine Learning

2605.08352

Country:

North America > United States (1.00)
Europe (1.00)
North America > Canada (0.68)

Genre: Workflow (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Causal Inference with Categorical Unobserved Confounder via Mixture Learning

Saha, Aytijhya, Bates, Stephen, Shah, Devavrat

arXiv.org Machine LearningMay-20-2026

Unobserved confounding is a fundamental challenge for estimating causal effects. To address unobserved confounding, recent literature has turned to two different approaches -- proxy variables and the use of multiple treatments. The first approach, commonly referred to as proximal causal inference, requires proxies to be assigned to specific asymmetric roles: treatment-inducing proxies (negative control exposures), variables that act as common causes of the treatment and outcome, and outcome-inducing proxies (negative control outcomes). In practice, however, identifying variables that satisfy these asymmetric roles can be difficult depending on the application domain. The second approach, commonly referred to as the ``Deconfounder," deals with multiple conditionally independent treatments. There has been limited progress towards developing a consistent estimation method for this setting. As the primary contribution of this work, we establish that causal effects are identifiable in both settings when the unobserved confounder is categorical under suitable conditions. Our approach builds on a mixture learning perspective: we show that the underlying confounding structure can be recovered by identifying the corresponding mixture distribution. We propose an estimation procedure based on tensor decomposition, which allows consistent recovery of the latent structure and comes with non-asymptotic guarantees. Simulation studies and real data experiments demonstrate that the proposed method performs well even with limited data.

artificial intelligence, assumption, machine learning, (15 more...)

arXiv.org Machine Learning

2605.19006

Genre: Research Report (1.00)

Industry: Health & Medicine > Health Care Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)

Add feedback

08f36fcf88c0a84c19a6ed437b9cbcc9-Supplemental.pdf

Neural Information Processing SystemsMay-1-2026, 01:40:53 GMT

artificial intelligence, graph, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

The Many Faces of Adversarial Risk

Neural Information Processing SystemsApr-25-2026, 22:30:57 GMT

Adversarial risk quantifies the performance of classifiers on adversarially perturbed data. Numerous definitions of adversarial risk--not all mathematically rigorous and differing subtly in the details--have appeared in the literature. In this paper, we revisit these definitions, make them rigorous, and critically examine their similarities and differences. Our technical tools derive from optimal transport, robust statistics, functional analysis, and game theory. Our contributions include the following: generalizing Strassen's theorem to the unbalanced optimal transport setting with applications to adversarial classification with unequal priors; showing an equivalence between adversarial robustness and robust hypothesis testing with -Wasserstein uncertainty sets; proving the existence of a pure Nash equilibrium in the two-player game between the adversary and the algorithm; and characterizing adversarial risk by the minimum Bayes error between a pair of distributions belonging to the -Wasserstein uncertainty sets. Our results generalize and deepen recently discovered connections between optimal transport and adversarial robustness and reveal new connections to Choquet capacities and game theory.

adversarial risk, artificial intelligence, machine learning, (21 more...)

Neural Information Processing Systems

Country: North America > United States > Wisconsin (0.28)

Industry:

Leisure & Entertainment > Games (0.54)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

The Many Faces of Adversarial Risk

Neural Information Processing SystemsApr-25-2026, 22:30:53 GMT

adversarial risk, artificial intelligence, machine learning, (21 more...)

Neural Information Processing Systems

Country: North America > United States > Wisconsin (0.28)

Industry:

Leisure & Entertainment > Games (0.54)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

ATheory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning (Supplementary File)

Neural Information Processing SystemsApr-25-2026, 09:01:34 GMT

This supplementary document contains more additional experimental details and the technical proofs of convergence results of the NeurIPS'21 submission entitled "ATheory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning". It is structured as follows. In Appendix A, we provides more experimental details, including training algorithm, network architecture, optimizer details, loss construction and training cost of SANE. Appendix B presents the proof and details of the main results, namely, Theorem 1, in Section 2, which analyzes the generalization performance of MoCo. Next, Appendix C introduces the proof roadmap and details of the main results, i.e.

artificial intelligence, augmentation, machine learning, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

2d290e496d16c9dcaa9b4ded5cac10cc-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 07:24:08 GMT

This appendix contains a proofs of the results in the main text and further analysis on the two FIM estimators ˆI1(θ)and ˆI2(θ). In particular, Appendix C presents an analysis of how the FIM estimators and their covariance tensors change under reparametrization. Appendix D presents element-wise bound alternatives to those presented in Section 3.2. Appendix E explores various results using alternative norms to the Frobenius norm results of the main text. Appendix F presents an analysis on taking a linear combination of the two FIM estimators.

artificial intelligence, estimator, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

258be18e31c8188555c2ff05b4d542c3-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 04:02:03 GMT

activation function, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)

Add feedback

Between Stochastic and Adversarial Online Convex Optimization: Improved Regret Bounds via Smoothness

Neural Information Processing SystemsApr-24-2026, 08:55:48 GMT

Stochastic and adversarial data are two widely studied settings in online learning. But many optimization tasks are neither i.i.d.

artificial intelligence, machine learning, optimization, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

PRIM-cipal components analysis

Liu, Tianhao, Díaz-Pachón, Daniel Andrés, Rao, J. Sunil

arXiv.org Machine LearningApr-20-2026

EVEN supervised learning is subject to the famous NoFree Lunch Theorems [1]-[3], which say that, in combinatorial optimization, there is no universal algorithm that works better than its competitors for every objective function [4]-[6]. Indeed, David Wolpert has recently proven that, on average, cross-validation performs as well as anti-crossvalidation (choosing among a set of candidate algorithms based on which has the worst out-of-sample behavior) for supervised learning. Still, he acknowledges that "it is hard to imagine any scientist who would not prefer to use [crossvalidation] to using anti-cross-validation" [7]. On the other hand, unsupervised learning has seldom been studied from the perspective of the NFLTs. This may be because the adjective "unsupervised" suggests that no human input is needed, which is misleading as many unsupervised tasks are combinatorial optimization problems that depend on the choice of the objective function. For instance, it is well known that, among the eigenvectors of the covariance matrix, Principal Components Analysis selects those with the largest variances [8]. However, mode-hunting techniques that rely on spectral manipulation aim at the opposite objective: selecting the eigenvectors of the covariance matrix with the smallest variances [9], [10]. Therefore, unlike in supervised learning, where it is difficult to identify reasons to optimize with respect to anti-cross-validation, in unsupervised learning there are strong reasons to reduce dimensionality for variance minimization. D. A. D ıaz-Pach on and T. Liu are with the Division of Biostatistics, University of Miami, Miami, FL, 33136 USA (e-mail: ddiaz3@miami.edu,

artificial intelligence, machine learning, principal component, (18 more...)

arXiv.org Machine Learning

2604.15538

Country: